ABSTRACT

The RNA Pol II transcription complex pauses just downstream of the promoter in a significant fraction of human genes. The local features of genomic structure that contribute to pausing have not been defined. Here, we show that genes that pause are more G-rich within the region flanking the transcription start site (TSS) than RefSeq genes or non-paused genes. We show that enrichment of binding motifs for common transcription factors, such as SP1, may account for G-richness upstream but not downstream of the TSS. We further show that pausing correlates with the presence of a GrIn1 element, an element bearing one or more G4 motifs at the 5'-end of the first intron, on the non-template DNA strand. These results suggest potential roles for dynamic G4 DNA and G4 RNA structures in cis-regulation of pausing, and thus in genome-wide regulation of gene expression, in human cells.

INTRODUCTION 

Genome-wide studies have shown that Pol II transcription complexes pause just downstream of the transcription start site (TSS) at many human genes (1-3). Pausing may poise a polymerase for rapid induction of transcription upon receipt of the appropriate signal, or provide a checkpoint at which the transcription complex ensures that all factors are present for productive elongation. Because pausing occurs only at a fraction of genes, one or more features of genomic sequence or structure must contribute to pausing at human genes. Those features have not yet been defined. Identifying the local features of DNA architecture that contribute to Pol II pausing has important implications for understanding mechanisms of genomic instability and the response of cells to chemotherapeutics.

G-rich intron 1 (GrIn1) elements are a recently identified feature of genomic structure (4). These conserved elements are present in almost one-half of all human genes and map to the 5'-end of the first intron, on the non-template strand. They bear the signature sequence motif characteristic of regions with potential to form G4 structures, $G_{\geq 3}N_xG_{\geq 3}N_xG_{\geq 3}N_xG_{\geq 3}$ (5-8). Their G-richness cannot be accounted for by sequences that would make them targets of well-defined regulatory mechanisms, such as CpG dinucleotides that undergo methylation, or motifs recognized by transcription factors or RNA processing factors. GrIn1 elements occupy a privileged genomic position: they are located on average 200 nt downstream of the TSS, within 100 bp of the 5'-end of the first intron and on the non-template strand. An element at this intronic position may regulate transcription or RNA processing without conferring selective pressure on protein sequence.
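The G4 signature motif can be expressed as a regular expression for scanning a single strand. The sketch below is illustrative only: the loop length x is not fixed in this section, so it assumes the commonly used range of 1 to 7 nt, and the names `has_g4_motif`, `MIN_LOOP` and `MAX_LOOP` are hypothetical.

```python
import re

# Hypothetical loop-length bounds: this section writes each loop as N_x
# without fixing x, so a commonly used range of 1-7 nt is assumed here.
MIN_LOOP, MAX_LOOP = 1, 7

# Four runs of >=3 guanines separated by three loops, read on one strand.
G4_MOTIF = re.compile(r"G{3,}(?:[ACGTN]{%d,%d}G{3,}){3}" % (MIN_LOOP, MAX_LOOP))


def has_g4_motif(seq: str) -> bool:
    """Return True if seq (read 5'->3') contains a G4 signature motif."""
    return G4_MOTIF.search(seq.upper()) is not None


print(has_g4_motif("TTGGGAGGGAGGGAGGGTT"))  # four G-runs with 1-nt loops
print(has_g4_motif("GGGAGGGAGGG"))          # only three G-runs
```

Because loops may themselves contain guanines, the regex relies on backtracking to find any valid partition of the G-runs; a stricter definition would need a different pattern.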

The position, conservation and abundance of GrIn1 elements suggest that these elements might function in regulation of gene expression. The G-richness of the GrIn1 element confers the potential to form a dynamic structure upon transcription of a genomic region. This structure, called a G-loop, carries a co-transcriptional RNA/DNA hybrid on the template strand, and G4 DNA interspersed with single-stranded regions on the G-rich non-template (coding) strand (9-12). Persistent co-transcriptional RNA/DNA hybrids like those that characterize G-loops can contribute to genomic instability (11,13-16). They also prolong the denaturation of the DNA strands that normally accompanies transcription, enhancing the potential of DNA to form G4 structures that may function as regulatory targets.

Here, we address the possibility that GrIn1 elements correlate with transcriptional pausing. We show that genes that can be classified as paused are more G-rich in the region flanking the TSS than RefSeq genes or non-paused genes, and we demonstrate a strong correlation between transcriptional pausing and the presence of a GrIn1 element. These results suggest that formation of G4 structures on the non-template strand of the DNA or at the 5'-end of the nascent mRNA may promote promoter-proximal pausing. GrIn1 elements may thereby contribute to genome-wide regulation of the expression of specific classes of genes, and they may also influence cellular sensitivity to drugs that perturb the normal dynamics of DNA structure formation during transcription, including topoisomerase poisons and compounds designed to target G4 structures.

MATERIALS AND METHODS 

Sequence data, regulatory motif masking and statistical analysis

Sequence data for the 18,187 human RefSeq genes (NCBI 36 assembly) were downloaded from Ensembl database release 54 using BioMart (17,18). As previously described, we defined G-richness as the frequency, within each set of genes, of 100-nt sequences that contain a G4 DNA signature motif, $G_{\geq 3}N_xG_{\geq 3}N_xG_{\geq 3}N_xG_{\geq 3}$ (5). Intron sequence derivation and calculations of G-richness were performed as described (4). For genes that express alternative transcripts with different first introns, the 5'-most first intron was included in the analysis. Masking of regulatory motifs was performed as described (4). The $\chi^2$-test was performed with the statistics program R version 2.7.1.
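The G-richness measure defined above can be illustrated with a short sketch that reports, for each 100-nt window, the fraction of genes whose window contains a G4 signature motif. It assumes sequences already aligned on a common coordinate around the TSS, non-overlapping windows, motifs counted only when they lie entirely within one window, and a loop length of 1-7 nt; none of these implementation details is specified here, and `g_richness_profile` is a hypothetical name.

```python
import re

# G>=3 N_x G>=3 N_x G>=3 N_x G>=3 with an assumed loop length of 1-7 nt.
G4_MOTIF = re.compile(r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}")


def g_richness_profile(seqs, window=100):
    """Fraction of sequences whose k-th window contains a G4 motif.

    seqs: equal-length strings aligned on the same coordinate
    (e.g. -1000..+1000 around the TSS). Windows do not overlap, and a
    motif is counted only if it lies entirely inside one window.
    """
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and of equal length")
    profile = []
    for k in range(length // window):
        start, end = k * window, (k + 1) * window
        hits = sum(1 for s in seqs if G4_MOTIF.search(s[start:end].upper()))
        profile.append(hits / len(seqs))
    return profile
```

Sliding or overlapping windows would give a smoother profile; the non-overlapping choice here simply keeps the example short.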

Microarray analysis of NCI-60 lines 

Affymetrix GeneChip Human Exon 1.0 ST (GH Exon 1.0 ST) microarray analysis of NCI-60 cancer cell lines was carried out as described previously (8). In brief, microarrays were hybridized, usually in triplicate, following the manufacturer's instructions at GeneLogic (Gaithersburg, MD), and results were normalized by robust multi-array analysis (19) using Partek Genomics Suite version 6.3. The GH Exon 1.0 ST microarray analysis of the NCI-60 lines characterized expression of 16,959 annotated genes, and probes were mapped to transcripts using exon designations assigned by SpliceCenter (20). Classification of genes as paused or non-paused was based on the difference in average probe set intensity and in average standard deviation between the first exon and the other exons across all cell lines. Probe intensity criteria were first developed empirically for the topoisomerase 1 (TOP1) gene (8), and those criteria were then applied to define paused genes in the larger database. At TOP1, exon 1 was expressed both at a higher level and less variably than the other exons: the increase in expression level of exon 1 was 1.24, and the reduction in standard deviation was 0.244. For subsequent analyses, paused genes were defined as those for which the average intensity of exons 2 through N minus that of exon 1 was less than -1.24, and the average standard deviation of exons 2 through N minus that of exon 1 was greater than 0.244. Genes were classified as non-paused if the difference in average intensity (exons 2 through N minus exon 1) was greater than 0 and the corresponding difference in average standard deviation was less than 0. Using these criteria, 3165 (19%) genes were classified as paused and 1401 (8%) as non-paused. The remaining genes did not fall into either category.
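The two-threshold rule above can be sketched as a small classifier. It assumes a matrix of normalized intensities with one row per exon (row 0 = exon 1) and one column per cell line, uses the sample standard deviation across cell lines, and averages exons 2 through N without weighting; these choices, and the name `classify_gene`, are hypothetical rather than taken from the original pipeline.

```python
from statistics import mean, stdev

# Cutoffs derived empirically at TOP1, as given in the text.
MEAN_DIFF_CUTOFF = -1.24  # mean(exons 2..N) - exon 1 must fall below this
SD_DIFF_CUTOFF = 0.244    # mean SD(exons 2..N) - SD(exon 1) must exceed this


def classify_gene(intensities):
    """Classify a gene as 'paused', 'non-paused' or 'unclassified'.

    intensities[e][c] is the normalized intensity of exon e (0 = exon 1)
    in cell line c; at least two cell lines are needed for an SD.
    """
    if len(intensities) < 2:
        raise ValueError("need exon 1 plus at least one downstream exon")
    means = [mean(row) for row in intensities]
    sds = [stdev(row) for row in intensities]  # sample SD across cell lines
    mean_diff = mean(means[1:]) - means[0]
    sd_diff = mean(sds[1:]) - sds[0]
    if mean_diff < MEAN_DIFF_CUTOFF and sd_diff > SD_DIFF_CUTOFF:
        return "paused"
    if mean_diff > 0 and sd_diff < 0:
        return "non-paused"
    return "unclassified"
```

Genes that meet neither rule fall into the unclassified group, matching the observation that most genes were assigned to neither category.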



RESULTS 

Inverse correlation between Pol II binding and G-richness 

In human genes, two peaks of G-richness flank the TSS, centred on the regions -100 to +1 and +200 to +300 (4). To ask if these peaks of G-richness correlate with binding by RNA Pol II, we graphed the frequency of G-richness and of Pol II binding sites, as determined by chromatin immunoprecipitation-sequencing (ChIP-Seq) of human T cells (1), in the 2-kb region flanking the TSS. The peak of Pol II binding, near +100, corresponded to a local minimum of G-richness, 200 bp upstream of the downstream peak of G-richness near +300 (Figure 1). This Pol II peak represents the average of all Pol II molecules, regardless of pausing status. That the peak of Pol II binding coincides with a local minimum of G-richness is consistent with the A/T richness of most promoters.

CpG dinucleotides, which are sites for regulatory 
methylation, can contribute to local G-richness. We 
graphed the distribution of CpG dinucleotides in the 
region flanking the TSS, and showed that this comprised 
a relatively broad peak, which is not coincident with the 
peaks of G-richness and lies somewhat upstream of the 
peak of Pol II binding (Figure 1). 




Figure 1. Inverse correlation between Pol II binding and G-richness at the TSS. Graph of the frequency of Pol II binding sites (1), CpG dinucleotides, and the frequency of G-richness in the interval -1000 to +1000 around the TSS. G-richness was defined as the frequency within each set of genes of 100-nt sequences containing the G4 DNA signature motif, $G_{\geq 3}N_xG_{\geq 3}N_xG_{\geq 3}N_xG_{\geq 3}$ (4).
Promoters of paused genes are enriched in G4 motifs

We next asked whether G-richness correlates with pausing, using three different operational definitions to classify genes as paused. The first definition distinguishes paused and non-paused genes based on relative expression of exon 1 and downstream exons, as determined by microarray analysis. The NCI-60 panel includes 60 cell lines representing multiple tumor types for which drug sensitivity and transcriptome activity have been extensively studied and correlated (8,21-23). We calculated the frequency of G-richness in the region -1000 to +1000 for the genes classified as paused (19%) or non-paused (8%) across all cell lines in the NCI-60 panel database, and for all RefSeq genes. Paused genes were more G-rich than RefSeq genes or non-paused genes (Figure 2A).

A second operational definition identifies paused genes as those at which Pol II is stably associated with the TSS in the absence of gene expression. Approximately one-third of genes in primary resting human CD4+ T cells were classified as paused by this criterion (1). We calculated the frequency of G-richness in the region flanking the TSS of those genes relative to RefSeq genes and non-paused genes in that data set. This analysis showed that paused genes were more G-rich in the region flanking the TSS than other genes (Figure 2B).

Chromatin marks can also be used to distinguish paused and non-paused genes. Histone modifications correlate with gene expression, with H3K4me3 characterizing active genes and H3K27me3 characterizing repressed genes. Some genes carry bivalent chromatin marks, with H3K4me3 near the promoter and H3K27me3 distributed more broadly along the gene, and such bivalent marks can be used to distinguish paused genes from other genes (1,24). Calculation of the frequency of G-richness in the region from -1000 to +1000 showed that genes with bivalent H3K4me3 and H3K27me3 marks were more G-rich than RefSeq genes or inactive genes with monovalent H3K27me3 marks (Figure 2C).

These analyses show that paused genes, as defined by any of the three criteria above, are more G-rich than non-paused genes. The G-richness of paused genes extends throughout the 2-kb interval analyzed, and includes regions both upstream and downstream of the promoter. Sequences upstream of the promoter may contribute to pausing by serving as sites for transcription factors that promote pausing. In this regard, it is interesting that genes classified as non-paused based on relative expression of exon 1 and downstream exons were comparatively G-poor (Figure 2A). This raises the possibility that transcription factors with G/C-rich binding motifs may contribute to pausing at some genes, or conversely that transcription factors with A/T-rich binding motifs may prevent pausing at others.

Strand-biased G-richness downstream of the TSS at paused genes

The results above (Figure 2) establish that paused genes are more G-rich than other genes. How might G-richness contribute to the mechanism of pausing? Pausing could in principle be caused by formation of G4 structures in either the DNA or the nascent transcript. If G-loop formation contributes to the mechanism of pausing, then G-richness of paused genes is predicted to exhibit a strand bias, with G-rich regions downstream of the TSS concentrated in the non-template strand (9,10,12). We therefore compared the frequency of non-template and template strand G-richness in the 2-kb region spanning the TSS for genes classified as paused and non-paused based on relative expression of exon 1 and downstream exons in the NCI-60 database, and for all RefSeq genes. For all three groups of genes, there was clear strand asymmetry in G-richness downstream of the TSS, with greater G-richness on the non-template strand. Notably, paused genes were more G-rich than RefSeq genes, which in turn were more G-rich than non-paused genes (Figure 3).

[Figure 2 plot panels: (A, B) frequency of G-richness (0-45%) for Paused, RefSeq and Non-paused genes; (C) frequency of G-richness for H3K4me3,H3K27me3 (bivalent), RefSeq and H3K27me3 genes.]

Figure 2. G4 motifs are enriched near promoters of paused genes. (A) Graph of the frequency of G-richness in genes defined as paused from the NCI-60 database in the interval -1000 to +1000 around the TSS. (B) Graph of the frequency of G-richness in genes in which Pol II is stably associated with the TSS in the absence of gene expression, in the interval -1000 to +1000 around the TSS. This data set derives from analysis of primary resting human CD4+ T cells (1), and corresponds to the same data set for which the genome-wide analysis of Pol II position is presented in Figure 1. (C) Graph of the frequency of G-richness in the interval -1000 to +1000 around the TSS in genes carrying bivalent chromatin marks H3K4me3 and H3K27me3, as determined by analysis of primary resting human CD4+ T cells (1).

The G-richness of the genes analyzed exhibited a characteristic distribution. For all three groups, more genes were G-rich on the non-template strand than on the template strand.
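Strand-specific G-richness, as compared in Figure 3, can be sketched by scanning a region given 5'->3' on the non-template strand and also scanning its reverse complement, since G-runs on the template strand appear as C-runs in the non-template sequence. The function names and the 1-7 nt loop length are assumptions for illustration.

```python
import re

# G>=3 N_x G>=3 N_x G>=3 N_x G>=3 with an assumed loop length of 1-7 nt.
G4_MOTIF = re.compile(r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def strand_g4(non_template_seq):
    """Report G4 motifs on each strand of a region given on the non-template strand."""
    non_template = G4_MOTIF.search(non_template_seq.upper()) is not None
    # The template strand, read 5'->3', is the reverse complement.
    template = G4_MOTIF.search(reverse_complement(non_template_seq)) is not None
    return {"non_template": non_template, "template": template}
```

Applying `strand_g4` window by window to each gene, and tallying the two keys separately, would yield the two strand-specific curves of a plot like Figure 3.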

For all three groups of genes, upstream of the TSS and on the non-template strand, the maximum frequency of G-richness fell within the region from -100 to -1, where 40% of paused genes were G-rich, compared with 22% of non-paused genes and 30% of all RefSeq genes. Downstream of the TSS and on the non-template strand, the maximum frequency of G-richness fell within the region from +200 to +300, where 42% of the paused genes were G-rich, compared with 28% of the non-paused genes and 35% of the RefSeq genes. Downstream of the TSS and on the non-template strand, a peak in the frequency of G-richness was also evident among paused and RefSeq genes, but not non-paused genes.

Transcriptional regulatory motifs account for some but not all G-richness near the TSS of paused genes

G-richness can reflect the presence of DNA sequence elements with well-characterized functions, including CpG dinucleotides that are targets of methylation, as well as motifs for some common transcription factors that recognize G-rich sites in duplex DNA, including SP1 (RGGCGKR), KLF (GGGGTGGGG), EKLF (AGGGTGKGG), MAZ (GGGAGGG), EGR-1 (GCGTGGGCG) and AP-2 (CGCCNGSGGG). To eliminate contributions from these elements, we analyzed the distribution of G-richness with these sites masked. The frequency of G-richness may be greatly underestimated following masking, because masking is carried out based on DNA sequence alone, independent of information on whether a motif actually serves as a binding site for its cognate factor. Moreover, because it is not known whether a specific motif contributes to pausing, masking may even eliminate from the tally genes bearing a motif that promotes pausing. Nonetheless, masking provides a convenient view of how canonical motifs affect the genomic landscape.
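Masking by consensus sequence, as described above, can be sketched by translating each IUPAC consensus into a regular expression and replacing every covered base with N. This is a minimal sketch, not the masking procedure of reference (4): it scans only the strand it is given (a full analysis would presumably also consider motifs on the opposite strand), and the helper names are hypothetical.

```python
import re

# IUPAC nucleotide codes needed for the motifs listed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "K": "[GT]", "M": "[AC]",
         "S": "[CG]", "W": "[AT]", "N": "[ACGT]"}

# Consensus motifs as listed in the text.
MOTIFS = {"SP1": "RGGCGKR", "KLF": "GGGGTGGGG", "EKLF": "AGGGTGKGG",
          "MAZ": "GGGAGGG", "EGR-1": "GCGTGGGCG", "AP-2": "CGCCNGSGGG"}


def iupac_to_regex(motif):
    """Translate an IUPAC consensus (e.g. RGGCGKR) into a regex string."""
    return "".join(IUPAC[base] for base in motif)


def mask(seq, motif_names):
    """Replace every base covered by any listed motif with N (one strand only)."""
    seq = seq.upper()
    covered = [False] * len(seq)
    for name in motif_names:
        # A lookahead lets overlapping occurrences all be found.
        pattern = re.compile("(?=(%s))" % iupac_to_regex(MOTIFS[name]))
        for hit in pattern.finditer(seq):
            for i in range(hit.start(), hit.start() + len(hit.group(1))):
                covered[i] = True
    return "".join("N" if c else b for b, c in zip(seq, covered))
```

After masking, G-richness is recomputed on the masked sequence, so any G4 motif that depended on masked guanines no longer counts.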

We first masked SP1 binding motifs, separately analyzing all RefSeq genes and the paused and non-paused genes identified in the NCI-60 database. Eliminating SP1 motifs primarily affected the region upstream of the TSS, eliminating the peak of G-richness upstream of the TSS in the non-template (but not template) strand of all three sets of genes (Figure 4A). Downstream of the TSS, G-richness of the non-template strand for paused genes was still greater (37%) than for non-paused genes (24%) or RefSeq genes (31%).

Masking of CpG dinucleotides reduced the frequency of G-richness both upstream and downstream of the TSS in all three sets of genes (Figure 4B). Even after masking, there was clear strand asymmetry in G-richness downstream of the TSS for all groups of genes. In addition, G-richness of paused genes remained greater at both upstream and downstream peaks (25 and 24%, respectively) than G-richness of non-paused genes (15 and 20%) or RefSeq genes (19 and 22%). Thus, although CpG content corresponded with pausing in both upstream and downstream regions, it did not account for all of the G-richness surrounding the TSS. We note that a peak of G-richness upstream of the TSS that was eliminated by masking SP1 motifs (Figure 4A) persisted after masking CpG motifs (Figure 4B), suggesting that SP1 motifs make their primary contribution to G-richness upstream and not downstream of the TSS.

Figure 3. Strand-biased G-richness downstream of the TSS at paused genes. Graph of the frequency of G-richness of the non-template (dark lines) and template (pale lines) strands in the interval -1000 to +1000 around the TSS for genes in the NCI-60 database classified as paused (left) or non-paused (center), and RefSeq genes (right).

Figure 4. Transcriptional regulatory motifs do not account for G-richness near the TSS of paused genes. Graph of the frequency of G-richness of non-template (dark lines) and template (pale lines) strands in the interval -1000 to +1000 around the TSS for paused genes (left), non-paused genes (center) and all human RefSeq genes (right), with the following motifs masked: (A) SP1 motifs. (B) CpG motifs. (C) CpG, SP1, MAZ, KLF, EKLF, EGR-1 and AP-2 motifs.

Finally, we maximally depleted common G-rich motifs by masking binding motifs for six common transcription factors that bind G-rich sites (SP1, KLF, EKLF, MAZ, EGR-1 and AP-2), as well as CpG motifs. To maximize depletion of these canonical motifs, they were masked before CpG motifs were eliminated. This stringent masking diminished the peaks of G-richness upstream and downstream of the TSS in all three classes of genes, but affected the upstream peak most profoundly (Figure 4C). Following stringent masking, the strand asymmetry in G-richness downstream of the TSS persisted, although only small differences were evident in non-template strand G-richness at the upstream and downstream peaks of paused genes (11 and 17%, respectively) relative to non-paused genes (7 and 14%, respectively). The very high stringency of masking is likely to be responsible for this considerable decrease in the frequency of G-richness, and these small differences are unlikely to be significant.
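Why transcription factor motifs were masked before CpG can be seen in a toy example. The SP1 consensus RGGCGKR itself contains a CpG; if CpG dinucleotides are masked first, the SP1 site no longer matches and its flanking guanines survive masking. The sequential-masking helpers and the test sequence below are hypothetical illustrations.

```python
import re

SP1 = re.compile(r"[AG]GGCG[GT][AG]")  # RGGCGKR written out as a regex
CPG = re.compile(r"CG")


def mask_pattern(seq, pattern):
    """Replace each non-overlapping match of pattern with Ns of equal length."""
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq)


def mask_in_order(seq, patterns):
    """Apply the masks one after another, each on the output of the previous one."""
    for pattern in patterns:
        seq = mask_pattern(seq, pattern)
    return seq


site = "TTAGGCGTATT"  # hypothetical sequence containing one SP1 consensus site
motifs_first = mask_in_order(site, [SP1, CPG])  # whole SP1 site is masked
cpg_first = mask_in_order(site, [CPG, SP1])     # only the CpG is masked
print(motifs_first, cpg_first)
```

Masking motifs first removes every guanine of the site, whereas masking CpG first leaves two guanines that could still contribute to a G4 motif, which is why the motif-first order depletes G-richness more completely.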
GrIn1 elements correlate with pausing

We previously found that almost one-half of all human genes contain G-rich elements on the non-template DNA strand at the 5'-end of the first intron, referred to as GrIn1 elements (4). To ask if a difference in GrIn1 element frequency characterizes paused and non-paused genes, we calculated G-richness across the first 1000 bp of the first introns of RefSeq genes and of genes classified as paused or non-paused in the NCI-60 database. (This analysis was restricted to introns at least 1000 bp in length in order to include a constant number of genes along the length distribution. We previously showed (4) that setting the lower limit of intron size to either 100 bp or 1000 bp generates an essentially identical distribution of G-richness.) A fraction of genes in all three groups exhibited a peak of G-richness at the very 5'-end of the first intron, consistent with the presence of a GrIn1 element (Figure 5A). GrIn1 elements were present in 57% of paused genes, 38% of non-paused genes and 50% of RefSeq genes. The difference between the fraction of paused and non-paused genes containing GrIn1 elements was highly significant ($\chi^2 = 82$; $P < 10^{-10}$).
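The comparison above is a 2 × 2 contingency test (paused or non-paused, with or without a GrIn1 element). The sketch below computes a Pearson $\chi^2$ statistic with 1 degree of freedom and applies Yates' continuity correction by default, mirroring the default of R's `chisq.test` for 2 × 2 tables; whether the original analysis used the correction is not stated. The sizes of the gene groups with introns of at least 1000 bp are not given here, so the counts in the usage line are hypothetical values chosen only to mirror the 57% and 38% proportions.

```python
from math import erfc, sqrt


def chi2_2x2(table, yates=True):
    """Pearson chi-square test (1 df) for a 2x2 table [[a, b], [c, d]].

    With yates=True a continuity correction is applied, following the
    default behaviour of R's chisq.test for 2x2 tables.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    expected = [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]
    diffs = [abs(observed[i][j] - expected[i][j]) for i in range(2) for j in range(2)]
    corr = min(0.5, min(diffs)) if yates else 0.0
    stat = sum((abs(observed[i][j] - expected[i][j]) - corr) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    p = erfc(sqrt(stat / 2))  # upper tail of the chi-square distribution, 1 df
    return stat, p


# Hypothetical counts: 57% of 2000 paused and 38% of 900 non-paused genes
# carry a GrIn1 element (rows = paused, non-paused; columns = with, without).
stat, p = chi2_2x2([[1140, 860], [342, 558]])
print(round(stat, 1), p)
```

With these made-up counts the statistic is roughly 89 and P is far below $10^{-10}$; the numbers illustrate the calculation only and are not the published result.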

Motifs for hnRNP proteins and CpG dinucleotides contribute to but do not account for GrIn1 elements

Two hnRNP proteins involved in RNA processing recognize motifs containing runs of three or more guanines in single-stranded DNA or RNA: hnRNP A (UAGGGU/A) and hnRNP H (GGGA) (25,26). These motifs contribute to but are not sufficient for binding, so the tally of motifs will overestimate their functional contribution to G-richness of the intron. After masking these motifs, 34% of paused genes, 19% of non-paused genes and 28% of RefSeq genes retained a peak of G-richness, differences comparable with those observed upon analyzing the unmasked genes (Figure 5B). Masking CpG motifs in addition to hnRNP A and H binding motifs reduced the frequency of G-richness at the 5'-end of intron 1, so that a peak of G-richness was evident in 19% of paused genes, 13% of non-paused genes and 17% of RefSeq genes (Figure 5C). Thus, even with all these motifs masked, GrIn1 elements were more frequent in paused genes than in other gene classes.
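A GrIn1 call with optional hnRNP motif masking can be sketched as follows, using the working definition of a GrIn1 element as at least one G4 motif within the first 100 bp of the first intron on the non-template strand. The hnRNP A motif UAGGGU/A is written in DNA form as TAGGG followed by T or A; the 1-7 nt loop length, masking the whole intron before truncation, and the function names are all assumptions for illustration.

```python
import re

# G>=3 N_x G>=3 N_x G>=3 N_x G>=3 with an assumed loop length of 1-7 nt.
G4_MOTIF = re.compile(r"G{3,}(?:[ACGTN]{1,7}G{3,}){3}")
HNRNP_MOTIFS = [
    re.compile(r"(?=(TAGGG[TA]))"),  # hnRNP A: UAGGGU/A, written as DNA
    re.compile(r"(?=(GGGA))"),       # hnRNP H: GGGA
]


def mask_hnrnp(seq):
    """Replace bases covered by hnRNP A or H motifs with N (lookaheads catch overlaps)."""
    seq = seq.upper()
    covered = [False] * len(seq)
    for pattern in HNRNP_MOTIFS:
        for hit in pattern.finditer(seq):
            for i in range(hit.start(), hit.start() + len(hit.group(1))):
                covered[i] = True
    return "".join("N" if c else b for b, c in zip(seq, covered))


def has_grin1(intron1_non_template, mask=False, span=100):
    """True if a G4 motif lies within the first `span` nt of intron 1."""
    seq = mask_hnrnp(intron1_non_template) if mask else intron1_non_template.upper()
    return G4_MOTIF.search(seq[:span]) is not None
```

Masking before truncating avoids creating partial motifs at the 100-nt boundary; the fraction of genes for which `has_grin1` returns True is the quantity compared between gene classes.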

DISCUSSION 

We have identified a correlation between G-richness near the TSS and pausing in human genes. This correlation emerged from a genome-wide analysis, which examined genes classified as paused in the NCI-60 panel of cell lines or in primary resting T cells. The analysis defined pausing by three different operational criteria: relative levels of transcripts from exon 1 and downstream exons; association of Pol II with the TSS in the absence of transcription; and bivalent histone marks. Downstream but not upstream of the TSS, G-richness of paused genes was biased to the non-template DNA strand. G-rich consensus recognition motifs for sequence-specific DNA- or RNA-binding proteins, and CpG dinucleotides, accounted for some but not all of the G-richness of paused genes. We emphasize that while the correlation between G-richness and pausing was strong, it did not apply to all genes. Additional mechanisms undoubtedly contribute to pausing, and G-richness is likely to be only one of the many factors that modulate pausing at any given gene.
The correlation between pausing and G-richness was particularly apparent at the 5'-end of the first intron, where paused genes proved significantly more likely to carry GrIn1 elements, defined as at least one G4 motif within the first 100 bp of the first intron, on the non-template DNA strand (4). GrIn1 elements characterized 57% of paused genes but only 38% of non-paused genes. The genomic position of GrIn1 elements is consistent with a possible role in promoter-proximal pausing. GrIn1 elements lie at the very 5'-end of the first intron, or ~200-300 bp downstream of the TSS, as the median distance from the TSS to the 5'-end of the first intron is 198 bp for human genes and GrIn1 elements are about 100 bp in length. Promoter-proximal pausing occurs in the region +20 to +50 relative to the TSS (2). A regulatory element 200-300 bp downstream of the TSS could readily communicate with Pol II or other components of the transcription apparatus to cause Pol II to pause.

G4 motifs and the mechanism of pausing 

The correlation between G4 motifs and pausing suggests that dynamic structures formed upon transcription of a region bearing G4 motifs may contribute to regulation of pausing in cis. Figure 6 illustrates those structures, which may promote pausing by distinctive mechanisms. (i) A G4 DNA structure formed behind the advancing polymerase may be recognized by factors that regulate pausing, which in turn cause the polymerase to pause. A compelling precedent for a cis-regulatory role for G4 DNA has recently been provided by evidence that G4 DNA formation controls pilin gene antigenic variation in Neisseria gonorrhoeae (27). In addition, the human TOP1 gene has recently been found to be regulated by pausing in the first intron at conserved G4 DNA elements (8). Alternatively, a G4 structure in the DNA might serve as a roadblock to an advancing polymerase, as suggested by in vitro analysis of transcription on G-rich templates (28), as well as by evidence that G4 motifs can block progression of DNA polymerase or even the translation machinery (29-31). In that case, pausing would not occur during the first round of transcription, but after a 'pioneering' round of transcription that enabled a G4 DNA structure to form. (ii) A G4 RNA structure in the 5'-end of the nascent transcript may communicate a pause to the transcription apparatus. This mechanism of pausing has been extensively documented in prokaryotes, where RNA hairpins interact with the polymerase complex to promote pausing at specific sites (32). In human cells, the Trans-Activating Response (TAR) element of the HIV-1 retrovirus has been shown to form a stem-loop structure recognized by Trans-Activator of Transcription (TAT) and associated factors to promote transcription (33). (iii) A stable co-transcriptional RNA/DNA hybrid may communicate a signal for pausing via the RNA processing apparatus or the transcription apparatus. Single-molecule imaging has provided dramatic evidence of how co-transcriptional RNA/DNA hybrids can contribute to 'pile-ups' of Pol I actively transcribing the G-rich rDNA in budding yeast (34).

Polymerase pausing is transient (35), and specific regulatory mechanisms may enable a polymerase to exit the paused state. A polymerase that pauses upon encountering a G4 structure could resume transcription upon elimination of that structure, e.g. by a G4 helicase, or if the polymerase/G4 interaction were interrupted by another factor. In this regard, it is interesting that the hnRNP proteins which interact with RNA in the nucleus contain structural domains (RRM/RBD domains or RGG domains) that recognize and may destabilize G4 structures (36), raising the possibility that they may compete with components of the transcription apparatus for binding to G4 structures.

Figure 6. Regulation of transcriptional pausing at G4 motifs. Model of dynamic nucleic acid structures that may contribute to pausing upon transcription of a G-rich region. Mechanisms that contribute to pausing may include: (i) G4 DNA formed behind an advancing polymerase may be recognized by factors that promote pausing; (ii) a G4 DNA structure formed in a 'pioneering' round of transcription may serve as a roadblock during the next round of transcription; (iii) a G4 RNA structure in the nascent transcript may communicate a pause to the transcription complex, as occurs in prokaryotes; and (iv) a stable co-transcriptional RNA/DNA hybrid may promote pausing via signals transmitted through the RNA processing apparatus.

No single mechanism is likely to account for pausing at every gene. Moreover, the genome-wide analysis that we carried out does not show that all genes that pause carry GrIn1 elements, or that GrIn1 elements are simple identifiers of genes that pause. Nonetheless, the model in Figure 6 should provide a useful starting point for future experiments to elucidate the mechanism of pausing at individual genes and classes of genes.

G-richness and genomic instability in AID-expressing tumors

We have previously shown that G-rich regions are targets of translocations in B cell lymphomas that express the DNA deaminase AID, although not in T cell leukemias, which do not express AID (11). AID associates with a pausing factor, Spt5 (37). The connection we have established between G-richness and pausing suggests that Spt5 may recruit AID to G-rich paused regions to initiate instability. High levels of AID expression characterize ovarian, breast and prostate malignancies (38) as well as B cell lymphomas. Our results suggest that G-rich sites of pausing may also be targeted for instability in those tumor types.

G4 motifs and drug sensitivity 

A role for G4 structures in polymerase pausing has implications for understanding the mechanisms of several classes of drugs, including G4-binding small-molecule ligands, G4 aptamers and topoisomerase I poisons. Small molecules that target G4 structures are currently in active development, with telomeres and rDNA as particularly prominent targets (39-42). Our results suggest that interactions with transcription-induced structures may contribute to both the effects and side effects of these drugs. G4 aptamers have also shown promise in treatment of cancer, but their mechanism of action is complex (43). Our results raise the possibility that transcription-induced G4 structures may compete with aptamers for binding key factors, thereby causing unanticipated off-target effects. This could, for example, explain the cell type specificity of some aptamers, as binding competition would be determined by the genes expressed in a given cell type.

Camptothecin, a topoisomerase I poison, is the prototype for an important class of cancer chemotherapeutics (44). Treatment of cells with camptothecin has been shown to diminish Pol II pausing (45), an observation that can be explained in terms of the model shown in Figure 6. Formation of co-transcriptional RNA/DNA hybrids is very sensitive to local superhelicity (16,34,46,47). Camptothecin treatment prolongs the half-life of the covalent topoisomerase I/DNA intermediate on the DNA, and may thereby diminish not only local superhelicity but also the stability of the local structure containing a co-transcriptional hybrid that promotes pausing. This would be expected to reduce pausing at a subset of genes in camptothecin-treated cells. In this regard, it is interesting that the TOP1 gene, which encodes topoisomerase I, carries a GrIn1 element and is itself regulated by transcriptional pausing (8), which may render TOP1 expression sensitive to local superhelicity, and to camptothecin. The effect of camptothecin on transcript levels is likely to differ from gene to gene, depending on details of local regulation of gene expression and DNA architecture.